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We describe a determination of the strong coupling a s (Mz) from scaling violations of the nonsinglet DIS 
structure function, which is based on two novel techniques aimed at controlling and minimizing the theoretical 
error: a neural network parametrization of BCDMS and NMC data, and QCD evolution by means of truncated 
Mellin moments. 


1. Introduction 

Scaling violations of DIS structure functions 
play a key role in our understanding of pertur¬ 
bative QCD, and their measurement is one of 
the most natural and potentially cleanest meth¬ 
ods available for the determination of the strong 
coupling j^]. There are, however, practical and 
conceptual difficulties that must be overcome to 
achieve with this method a precise determination 
of a s and its associated statistical and theoretical 
errors. 

The theoretical prediction for scaling viola¬ 
tions is given by Altarelli-Parisi evolution equa¬ 
tions |2;]. A clean, analytic solution at NLO in the 
coupling is available in terms of Mellin moments 
of structure functions, which diagonalize the evo¬ 
lution operator. The difficulty in the implementa¬ 
tion of this solution is the fact that moments are 
not directly measurable, since they are weighted 
integrals of the structure function over the inter¬ 
val 0 < x < 1, and x —> 0 implies y—> oo. 
The uncertainty on the value of any given mo¬ 
ment is thus unknown, and in principle infinite: 
it is necessary to rely upon extrapolations. An 
alternative, practical, if less elegant method of so¬ 


lution is to work numerically, directly with the x— 
space integro-differential equation. This involves, 
in principle, no extrapolation, since AP evolution 
is directional in x space, however it can be nu¬ 
merically challenging. Furthermore, in practice, 
this method usually requires the introduction of 
a parton parametrization, and this raises new is¬ 
sues: in fact, the choice of a parametrization is 
a source of theoretical bias, very difficult to as¬ 
sess, and errors on physical observables associated 
with the parametrization are notoriously difficult 
to evaluate [||. 

Our goal here is to present a data-driven de¬ 
termination of a s JJ, which tackles the problems 
outlined above. We use BCDMS |H) and NMC (fj 
data for the nonsinglet DIS structure function 
and we strive to minimize all sources of theoreti¬ 
cal bias and uncertainty, while accurately assess¬ 
ing the statistical effects of experimental errors 
and correlations. 

To this end, we employ two theoretical tools 
which have recently been developed. First, we 
solve AP equations by means of truncated Mellin 
moments Q; next, we make use of a bias-free 
parametrization of F 2 , constructed by means of 
neural networks ||. 
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Mellin moments over a truncated interval (cco < 
x < 1) are observable quantities; as shown in 
Ref. |Q, they obey a simple evolution equation, 
which can be approximated with arbitrary preci¬ 
sion by a matrix equation admitting a simple an¬ 
alytic solution. In principle, since one is manip¬ 
ulating observable quantities, a parametrization 
of the structure function is not needed. In prac¬ 
tice, however, data coverage and precision are not 
sufficient, and to extract from the data the max¬ 
imum amount of information (for example, in or¬ 
der to combine errors on neighboring data points 
in the best way) it is necessary to impose the con¬ 
straint of continuity on the structure function, i. e. 
to introduce a parametrization. To do so without 
introducing a theoretical bias, and keeping full 
control on the propagation of experimental errors 
and correlations, we make use of the results of 
Ref. where neural networks were employed in 
conjunction with Monte-Carlo techniques to con¬ 
struct a faithful representation of the probability 
distribution in the space of structure functions. 

2. Evolution with truncated moments 

Truncated Mellin moments of a parton distri¬ 
bution q(x,t ), with t = log n 2 f, are defined by 

q n (x 0 ,t)= f dx x n ~ 1 q(x,t) . (2.1) 

J Xq 

They satisfy the evolution equation 

^ = 7 ^ J dy y n ~ 1 q(y,t) G n j , (2.2) 

where 

G n (x,a s ) = f dzz n ~ 1 P(z,a s ) (2.3) 

J X 

is the truncated moment of the appropriate AP 
kernel P(z,a s ). 

As Xq —> 0, G n becomes the anomalous dimen¬ 
sion 7 n , and different moments evolve indepen¬ 
dently; for xo ^ 0, evolution couples the n-th 
truncated moment q n with all other qk, with k > 
n, as easily seen by Taylor expanding G n (xo/y) 
around y = 1. This Taylor expansion converges 
in the interval xq < y < 1, since G„ only has 


integrable singularities due to + distributions at 
y = x o- Truncating the expansion at the M-th 
term yields the linear system 

i M 

“Jr = (^(xo,a s ) q n+p (x 0 ,t) , (2.4) 

p=o 

with coefficients that can be computed analyti¬ 
cally to the order to which the splitting functions 
are known. 

Methods for the solution of Eq. ( |2.4| ) and their 
accuracy have been studied in Refs. |fr||j|JT(|. One 
may first of all observe that the matrix of anoma¬ 
lous dimensions governing the evolution of trun¬ 
cated moments is upper triangular. As a con¬ 
sequence, it can be diagonalized analytically by 
means of a simple recursion relation. A second 
observation is that moments with significantly 
different indices are weakly coupled for small xq. 
This is what justifies the truncation of the expan¬ 
sion of the r.h.s. of Eq. (|T^) at finite M. 

The convergence of the series of approximations 
for increasing M has been studied systematically. 
The convergence of the approximation as a func¬ 
tion of M is good (leading to a few percent er¬ 
ror for Mg20), except for lowest nonsingular mo¬ 
ments (sensitive to singularities at y = xq). An 
improved version of the method |l(J deals with 
this problem, so that in fact for all finite moments 
Mg 12 is sufficient to achieve an accuracy on evo¬ 
lution at the percent level. 

The method has also been extended to sin¬ 
glet and gluon distributions [^| , with minor tech¬ 
nical complications; a NLO analytic solution is 
available in all cases and can be efficiently imple¬ 
mented numerically (for details, see [III]]). It is 
worth emphasizing that, being based on Mellin 
moments, the method is also well suited to in¬ 
clude the effects of threshold logarithms, which 
may in fact play a non-negligible role in the de¬ 
termination of a s . 

3. Neural network parametrization of Pi 

The standard procedure for fitting structure 
functions, as well as parton distributions, is to 
choose a simple functional form with enough free 
parameters, and then fix the value of the param¬ 
eters by minimizing a suitable y. This proce- 
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dure leads to well known difficulties in determin¬ 
ing errors to be associated with generic observ¬ 
ables depending on the fitted functions. First of 
all, the computation of errors and correlations 
of the fit parameters requires at least a fully 
correlated analysis of experimental errors, which 
is not always available, and may be very diffi¬ 
cult when data from different experiments are in¬ 
volved. Furthermore, standard error propagation 
methods are not generally applicable: many ob¬ 
servables are strongly nonlinear functionals of the 
fit parameters, and they often depend on nonlocal 
features of the fitted functions. Finally, as men¬ 
tioned above, the theoretical bias due to choice 
of parametrization is difficult to assess, and its 
effects can be large if data are not precise, as it 
happens for example in the case of polarized dis¬ 
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To solve these problems, what is needed [|13J is 
a reliable representation of the probability mea¬ 
sure V{F 2 ) in the space of structure functions 
F 2 (x,Q 2 ). Then, for any observable functional 
<3(F 2 ). one could compute 


(g [F 2 ] } = JvF 2 g [F 2 (x,Q 2 )\ V(F 2 ), (3.5) 

and similarly for higher moments. Just such a 
representation was constructed, using a combina¬ 
tion of Monte-Carlo techniques and neural ner- 
works, in Ref. jsj]. 

Neural networks are a class of algorithms pro¬ 
viding robust, universal and unbiased approxi- 
mants to incomplete or noisy data. As such, they 
are ideally suited to handle problems like the ones 
discussed above, which center on the need to re¬ 
construct a continuous function from a discrete 
set of data with errors. 

Space prevents us from discussing here in any 
detail the technology of neural networks, their 
properties and applications (for an introduction, 
see for example Ref. [^4|). For the record, the 
neural architecture implemented in Ref. |Q is that 
of a multilayer feed-forward network or “percep- 
tron”. This means that network nodes (neurons) 
are arranged in an ordered sequence of layers, 
and each neuron receives input from the neu¬ 
rons in the preceding layer, while feeding out¬ 
put to those in the successive layer. The net¬ 


work learns to interpolate the chosen set of data 
by the method of supervised training by back- 
propagation. In practice, the network attempts 
matching data to output, then proceeds to vary 
the parameters characterizing neuron activation 
(weights and thresholds), along the steepest de¬ 
scent contour, searching for the minimum of the 
chosen error function. 

The only assumption made by the network al¬ 
gorithm is that the data can be interpolated by 
a smooth function. All the parameters character¬ 
izing the network, such as size (number of neu¬ 
rons), architecture (structure of layers), length of 
the learning cycle, can be decided based on purely 
statistical criteria. Specifically, no assumption is 
made on the functional form of the interpolating 
function. 

The parametrization of the nonsinglet struc¬ 
ture function F 2 NS \x, Q 2 ) constructed in Ref. j|] 
is based on a total of 552 data points collected 
by the NMC and BCDMS collaboration. The 
method used is a combination of Monte-Carlo 
techniques with neural network technology, and 
consists of two steps. The first step is the Monte- 
Carlo generation of an ensemble of N rep pseudo¬ 
data sets, mimicking the real data, with the cor¬ 
rect multivariate distribution given by experi¬ 
mental errors, fully correlated. Let i = {x,Q 2 } 
be a point where F 2 NS ^ has been measured, ob¬ 
taining the result F,^ exp \ with statistical error 
normalization error cr.;^ and percentage sys¬ 
tematic errors fi^ a . Then one generates the N lep 
pseudo-data at point i according to 


(3.6) 


where k = 1,..., N lep , and r- ^, and are 
univariate gaussian random numbers, which for 
systematic errors are grouped in classes according 
to experimental correlations. 

The second step is the training of N rep neural 
networks, each one using one pseudo-data set. 
At the end of training, the parameters of each 
network are optimized for the interpolation of the 
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corresponding data set; the output of the process 
is a set of N lep continuous functions F^l(x,Q 2 ) 
representing a faithful sample of the probability 
distribution V(F 2 ). 

Finally, one may evaluate averages, errors and 
correlations of observables using the N rep net¬ 
works as a Monte-Carlo representation of V{F 2 ). 
In this context, a s can be treated like any func¬ 
tional of the structure function. 

4. Fit architecture and parameters 

Having at our disposal the ensemble of struc¬ 
ture functions needed for the analysis, there are 
still a number of parameters that may be cho¬ 
sen in the fitting procedure to determine a s , to 
maximize the accuracy of the results. First of all, 
we must choose the truncation point Xq entering 
the definition of truncated moments, Eq. (PI), as 
well as the fitting range in Q 2 . The criteria for 
these choices are simple: we must take maximal 
advantage of data coverage, in order to minimize 
statistical errors on individual moments; we must 
impose a lower cut on Q 2 to keep power correc¬ 
tion under control; we must keep Xq as small as 
possible, compatibly with data coverage, to en¬ 
sure a fast convergence of our approximate evo¬ 
lution equation. We also have a choice regard¬ 
ing the number of intermediate scales to be used 
as evolution targets: in principle we could apply 
the evolution equation between any two scales in 
the fitting range, however introducing too many 
intermediate scales would clearly not add new 
information, and it would lead to large correla¬ 
tions between neighboring moments. Based on 
these criteria and extensive testing, we choose 
xq = 0.03 for the truncation point, 20 Gev 2 < 
Q 2 < 70 Gev 2 for the fitting range, and n sc = 3 
for the number of scales, which are taken to be 
equally spaced on a logarithmic scale. 

A second set of choices to be made concerns our 
approximate evolution equation. We work with a 
NLO evolution equation with matching at heavy 
quark thresholds according to the Marciano pre¬ 
scription ||l5f ; we must then decide how many 
truncated moments should be included in the evo¬ 
lution to achieve a satisfactory accuracy while 
preserving numerical stability. An optimal choice 


is to include truncated moments with 1 < n < 11, 


i.e. to set M = 11 in Eq. (T2). Since we are in¬ 
cluding the lowest nonsingular moment, we must 
use the improved method of solution described 
in Ref. |lfj; the auxiliary parameter introduced 
there is set to N = 6. We performed several tests 
on our approximate solution with these param¬ 
eters, and found that the accuracy achieved on 
evolution in the relevant range is at the level of 


0.1%. 

A final important issue is the number of mo¬ 
ments that should be treated as fit parameters 
together with the strong coupling. In fact, the 
values of truncated moments at the target scales 
of perturbative evolution depend both on the cou¬ 
pling and on the values of the same moments at 
the initial scale. These initial values can be ei¬ 
ther fixed at the central experimental value, or 
fitted. It turns out that to achieve a satisfactory 
precision, given the statistical errors on truncated 
moments, the number of fitted moments must be 
?Zfit > 3. On the other hand, fitting a large num¬ 
ber of moments, especially successive ones, en¬ 
dangers the numerical stability of the fit due to 
the very large correlations between neighboring 
moments. This forces nm < 6. After testing 
the possible combinations, we find that the choice 
that minimizes errors while maintaining numeri¬ 
cal stability is to fit moments n = 2,4, 5, 6,8. 

We emphasize that all fit parameters have been 
varied within their respective windows of stability 
with negligible effects on the results. 


5. Result and errors 

As shown in Ref. Q, an ensemble of N lep = 
1000 networks is sufficient to reproduce correctly 
all experimental errors and correlations, without 
introducing any bias. With such an ensemble, 
and the fit architecture outlined above, our result 
with statistical errors is 

a s (M z ) = 0.124 1 ( stat .) . (5.7) 

Theoretical uncertainties remain to be estimated. 
A first source is the presence of power correction, 
which are not accounted for by a perturbative 
treatment. They can be of a kinematical nature 
(target mass corrections), or dynamical (higher 
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twist corrections), or due to elastic contributions 
at x = 1. As discussed in Ref. [Q, all are found 
to be negligible (< 1%) thanks to our choice of 
Q 2 range. A second source of theoretical error 
is given by the uncalculated NNLO and higher 
perturbative contributions to the evolution equa¬ 
tion. These can be estimated by varying the 
renormalization scale according to /r 2 en = k ren Q 2 
(note that there is no factorization scale depen¬ 
dence in DIS scheme). We have tested the range 
0.3 < k ren < 4, and found that the ensuing uncer¬ 
tainty is not negligible, indicating sizeable NNLO 
corrections: <7 ren = 1 0004■ It is conceivable, and 
it can be tested, that our method might be af¬ 
fected by an enhancement of threshold logarithm 
effects, since the fitting procedure involves rel¬ 
atively high Mellin moments. The inclusion of 
such logarithms in a resummed fit should be natu¬ 
ral with our formalism, since resummation is per¬ 
formed in Mellin space, where our evolution takes 
place. A final possible source of theoretical error 
is the location of heavy quark thresholds. This 
is estimated by varying the threshold position as 
Q 2 h = k t h,M 2 , with 0.3 < k t h < 4. The effect is 
expected to be nearly negligible, since only the b 
threshold is included in our Q 2 range, and only 
for some kth■ We find in fact that oth = i 0 002 - 

Summarizing, our final result for the strong 
coupling reads 

<*s{M z ) = 0.124 ± (exp.) + (th.).(5.8) 

We observe that the error is dominated by statis¬ 
tical uncertainties, consistently with our expecta¬ 
tions. The central value turns out to be some¬ 
what on the high side of the current world aver¬ 
age, though well within errors. We note that this 
is consistent with the possibility that threshold 
logarithms might affect our determination more 
than others, since it is known that their leading 
effect is to replace the argument of the coupling, 
changing Q 2 to Q 2 /n for the n-th moment, thus 
leading to a larger value for the effective coupling. 

Resummation of soft gluon effects will prob¬ 
ably lead to a further reduction of the theo¬ 
retical error, which is dominated by unknown 
higher order corrections. On the experimental 
side, the result could be significantly improved ei¬ 
ther with better statistical accuracy (particularly 


of deuteron data), or by extending the range in 
Q 2 , which might be achieved by the planned EIC 
facility (l(|. 
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